Population
Let the population be of size $N$. The numerical values will be denoted by $x_1, x_2, \ldots, x_N$.
$\mu = \frac{1}{N}\sum x_i$
$\tau = N\mu$
$\sigma^2 = \frac{1}{N}\sum(x_i - \mu)^2 = \frac{1}{N}\sum x_i^2 - \mu^2$
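As a quick check of these definitions, here is a minimal Python sketch (the five-element population is made up purely for illustration) computing $\mu$, $\tau$, and both forms of $\sigma^2$:

```python
import numpy as np

# A made-up population of N = 5 values, purely for illustration.
x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
N = x.size

mu = x.mean()                        # population mean
tau = N * mu                         # population total
sigma2 = np.mean((x - mu)**2)        # (1/N) * sum (x_i - mu)^2
sigma2_alt = np.mean(x**2) - mu**2   # the equivalent shortcut formula

print(mu, tau, sigma2, sigma2_alt)   # both variance forms agree
```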
Simple Random Sampling
We will denote the sample size by $n$ ($n < N$) and the values of the sample members by $X_1, X_2, \ldots, X_n$.
$\hat{\mu} = \bar{X} = \frac{1}{n}\sum X_i$
$\hat{\tau} = N\bar{X}$
Sampling Mean
$E(\bar{X}) = \mu$
$Var(\bar{X}) = \frac{\sigma^2}{n}$ (sampling with replacement)
If sampling is without replacement (the default assumption here):
$Var(\bar{X}) = \frac{\sigma^2}{n}\left(1 - \frac{n-1}{N-1}\right)$
$\sigma_{\bar{X}} \approx \frac{\sigma}{\sqrt{n}}$
It is easy to see that the larger $n$ is, the smaller $\sigma_{\bar{X}}$ is, and hence the more accurate the estimate.
Estimation of the population $\sigma$
$\sigma^2$
$\hat{\sigma}^2 = \frac{1}{n}\sum X_i^2 - \bar{X}^2$
$E(\hat{\sigma}^2) = \sigma^2\left(\frac{n-1}{n}\right)\frac{N}{N-1}$, so this is a biased estimate of $\sigma^2$.
So the unbiased estimate of $\sigma^2$ is $\frac{1}{n-1}\frac{N-1}{N}\sum(X_i - \bar{X})^2$.
$Var(\bar{X})$
$Var(\bar{X}) = \frac{\sigma^2}{n}\left(1 - \frac{n-1}{N-1}\right)$
An unbiased estimate of $Var(\bar{X})$ is:
$s_{\bar{X}}^2 = \frac{\hat{\sigma}^2}{n}\left(\frac{n}{n-1}\right)\left(\frac{N-1}{N}\right)\left(\frac{N-n}{N-1}\right)$
$E(s_{\bar{X}}^2) = Var(\bar{X})$
Note: this is the unbiased estimate of the variance of $\bar{X}$, the estimator of the population mean; its purpose is to gauge how much that estimator fluctuates.
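To see these formulas in action, here is a small simulation sketch (the population and all sizes are made-up assumptions) that draws repeated samples without replacement and checks that the empirical variance of $\bar{X}$ matches the finite-population formula and that $s_{\bar{X}}^2$ is unbiased for it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population (values chosen just for illustration).
population = rng.normal(50, 10, size=200)
N = population.size
sigma2 = population.var()            # population variance, denominator N
n = 25

true_var_xbar = sigma2 / n * (1 - (n - 1) / (N - 1))

xbar_draws, var_est_draws = [], []
for _ in range(20000):
    sample = rng.choice(population, size=n, replace=False)
    xbar = sample.mean()
    sigma2_hat = sample.var()        # (1/n) * sum X_i^2 - xbar^2
    s_xbar2 = sigma2_hat / n * (n / (n - 1)) * ((N - 1) / N) * ((N - n) / (N - 1))
    xbar_draws.append(xbar)
    var_est_draws.append(s_xbar2)

print("empirical Var(xbar):  ", np.var(xbar_draws))
print("theoretical Var(xbar):", true_var_xbar)
print("mean of s_xbar^2:     ", np.mean(var_est_draws))
```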
Random Sampling
Direct model
Pseudo-random number generation
How do we generate samples from a given distribution on a computer?
Start from the uniform distribution and transform it.
Box-Muller algorithm
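As a reference, a minimal sketch of the Box-Muller transform, which turns pairs of uniform random numbers into standard normal samples (the function name and sample count are illustrative):

```python
import numpy as np

def box_muller(size, rng=None):
    """Generate `size` standard normal samples from uniforms via Box-Muller."""
    rng = rng or np.random.default_rng()
    u1 = rng.random(size)
    u2 = rng.random(size)
    r = np.sqrt(-2.0 * np.log(u1))    # radius from the exponential part
    theta = 2.0 * np.pi * u2          # uniform angle
    return r * np.cos(theta)          # r * sin(theta) would give a second batch

samples = box_muller(100000)
print(samples.mean(), samples.std())  # should be close to 0 and 1
```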
Alias sampling
For a discrete distribution, locating the right interval of the CDF takes $\log n$ time per sample (binary search), while alias sampling needs only constant time per sample.
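A minimal sketch of the alias method (Vose's variant) is given below; the table-building step is $O(n)$ and each draw afterwards takes constant time. The function names and the example probabilities are assumptions for illustration only:

```python
import numpy as np

def build_alias_table(probs):
    """Vose's alias method: O(n) setup, O(1) per sample."""
    n = len(probs)
    scaled = np.asarray(probs, dtype=float) * n
    alias = np.zeros(n, dtype=int)
    prob = np.zeros(n)
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:            # leftovers are numerically ~1
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng):
    i = rng.integers(len(prob))        # pick a column uniformly
    return i if rng.random() < prob[i] else alias[i]

rng = np.random.default_rng(0)
prob, alias = build_alias_table([0.1, 0.2, 0.3, 0.4])
draws = [alias_sample(prob, alias, rng) for _ in range(10000)]
print(np.bincount(draws) / len(draws))  # roughly [0.1, 0.2, 0.3, 0.4]
```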
Indirect model
Rejection sampling
Choose the smallest proposal distribution that still envelops the target distribution, so that the number of rejections is minimized.
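A minimal rejection-sampling sketch follows; the Beta(2,2) target, the uniform proposal, and the envelope constant $M = 1.5$ are made-up choices for illustration:

```python
import numpy as np

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, M, n, rng=None):
    """Draw n samples from target_pdf by rejection sampling.

    Requires target_pdf(x) <= M * proposal_pdf(x) for all x; the tighter M is,
    the fewer rejections.
    """
    rng = rng or np.random.default_rng()
    out = []
    while len(out) < n:
        x = proposal_sample(rng)
        if rng.random() < target_pdf(x) / (M * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

# Example: sample Beta(2, 2) (pdf 6x(1-x), max value 1.5) with a Uniform(0, 1) proposal.
target = lambda x: 6.0 * x * (1.0 - x)
samples = rejection_sample(target,
                           proposal_sample=lambda rng: rng.random(),
                           proposal_pdf=lambda x: 1.0,
                           M=1.5, n=5000)
print(samples.mean())  # should be close to 0.5
```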
Proof
Stratified sampling
Population $\mu$ and $\sigma^2$
The population mean and variance of the $l$th stratum are denoted by $\mu_l$ and $\sigma_l^2$.
$\mu = \frac{1}{N}\sum_{l=1}^{L}\sum_{i=1}^{N_l}x_{il} = \sum_{l=1}^{L} W_l\mu_l$, where $W_l = N_l/N$ is the weight of stratum $l$.
Sample $\mu$ and $\sigma^2$
Mean of the $l$th stratum:
$\bar{X_l} = \frac{1}{n_l}\sum\limits_{i=1}^{n_l}X_{il}$
Variance of the $l$th stratum:
$s_l^2 = \frac{1}{n_l-1}\sum\limits_{i=1}^{n_l}(X_{il} - \bar{X_l})^2$
The obvious estimate of $\mu$ is:
$\bar{X_s} = \sum\limits_{l=1}^{L}\frac{N_l\bar{X_l}}{N} = \sum\limits_{l=1}^{L}W_l \bar{X}_l$
It is an unbiased estimator of $\mu$.
The variance of the stratified sample mean is given by:
$Var(\bar{X_s}) = \sum\limits_{l=1}^{L}W_l^2\frac{\sigma_l^2}{n_l}\left(1 - \frac{n_l-1}{N_l-1}\right)$
If the sampling fractions within all strata are small:
$Var(\bar{X_s}) \approx \sum\limits_{l=1}^{L}W_l^2\frac{\sigma_l^2}{n_l}$
Using $s_l^2$ in place of $\sigma_l^2$, the estimator of $Var(\bar{X_s})$ is:
$s_{\bar{X_s}}^2 = \sum\limits_{l=1}^L W_l^2 \frac{s_l^2}{n_l}\left(1-\frac{n_l-1}{N_l-1}\right)$
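Putting the stratified estimators together, here is a minimal sketch (the stratum values and sizes are made up for illustration) that computes $\bar{X_s}$ and $s_{\bar{X_s}}^2$:

```python
import numpy as np

def stratified_estimate(samples, N_l):
    """Stratified estimate of the population mean and its variance.

    samples: list of 1-D arrays, one per stratum (the sampled values X_il)
    N_l:     list of stratum sizes in the population
    """
    N = sum(N_l)
    mean_s, var_s = 0.0, 0.0
    for x, Nl in zip(samples, N_l):
        nl = len(x)
        Wl = Nl / N
        xbar_l = x.mean()
        s2_l = x.var(ddof=1)                  # s_l^2 with denominator n_l - 1
        mean_s += Wl * xbar_l
        var_s += Wl**2 * s2_l / nl * (1 - (nl - 1) / (Nl - 1))
    return mean_s, var_s

rng = np.random.default_rng(0)
strata = [rng.normal(10, 1, 30), rng.normal(20, 5, 30), rng.normal(30, 2, 40)]
print(stratified_estimate(strata, N_l=[300, 500, 200]))
```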
Estimator of the population total
With $T_s = N\bar{X_s}$, we have $E(T_s) = \tau$.
$Var(T_s) = N^2 Var(\bar{X_s}) = N^2\sum\limits_{l=1}^{L}W_l^2\frac{\sigma_l^2}{n_l}\left(1 - \frac{n_l-1}{N_l-1}\right) = \sum\limits_{l=1}^{L}N_l^2\frac{\sigma_l^2}{n_l}\left(1 - \frac{n_l-1}{N_l-1}\right)$
methods of allocation
ideal
The sample sizes $n_1, \ldots, n_L$ that minimize $Var(\bar{X_s})$ subject to
the constraint $n_1 + \cdots + n_L = n$ are given by:
$n_l = n \frac{W_l\sigma_l}{\sum_{k=1}^{L}W_k\sigma_k}$
We can see that the number of samples allocated to a stratum depends on that stratum's weight (size) and its standard deviation.
Using the optimal allocation $n_l$ given above and neglecting the finite population correction, the variance of the stratified estimate is:
$Var(\bar{X_{so}}) = \frac{(\sum W_l \sigma_l)^2}{n}$
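A minimal sketch of this optimal (Neyman-style) allocation rule, with made-up stratum sizes and standard deviations:

```python
import numpy as np

def optimal_allocation(n, N_l, sigma_l):
    """Split a total sample size n across strata in proportion to W_l * sigma_l."""
    N_l = np.asarray(N_l, dtype=float)
    sigma_l = np.asarray(sigma_l, dtype=float)
    W_l = N_l / N_l.sum()
    weights = W_l * sigma_l
    return n * weights / weights.sum()   # real-valued; round in practice

# Hypothetical strata: the large, high-variance stratum gets most of the budget.
print(optimal_allocation(100, N_l=[300, 500, 200], sigma_l=[1.0, 5.0, 2.0]))
```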
$\sigma_l$ unknown
If we use proportional allocation instead, $n_l = nW_l$:
$Var(\bar{X_{sp}}) = \sum\limits_{l=1}^L W_l^2\frac{\sigma^2_l}{n_l} = \frac{1}{n} \sum\limits_{l=1}^L W_l\sigma^2_l$
It is easy to see that the difference between $Var(\bar{X_{sp}})$ and $Var(\bar{X_{so}})$ is:
$Var(\bar{X_{sp}}) - Var(\bar{X_{so}}) = \frac{1}{n}\sum\limits_{l=1}^L W_l(\sigma_l - \bar{\sigma})^2$
where $\bar{\sigma} = \sum\limits_{l=1}^L W_l\sigma_l$.
MCMC
We cannot sample directly from the target distribution $p(x)$ in the integral $\int f(x)p(x)\,dx$.
Create a Markov chain whose transition matrix does not depend on the normalization term.
Make sure the chain has a stationary distribution $\pi(i)$ and that it is equal to the target distribution.
After a sufficient number of iterations, the chain will converge to the stationary distribution.
Markov Chain and Random Walk
A random process is called a Markov process if, conditional on the current state, its future is independent of its past.
$P(X_{n+1} = x_{n+1} \mid X_0 = x_0, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$
The term Markov property refers to the memoryless property of a stochastic process.
A Markov chain $X_t$ is said to be time homogeneous if $P(X_{s+t} = j \mid X_s = i)$ is independent of $s$. When this holds, putting $s = 0$ gives:
$P(X_{s+t} = j \mid X_s = i) = P(X_t = j \mid X_0 = i)$
If moreover $P(X_{n+1} = j \mid X_n = i) = P_{ij}$ is independent of $n$, then $X$ is said to be a time homogeneous Markov chain.
Transition matrix
The transition matrix is an $N\times N$ matrix of nonnegative entries such that the sum over each row of $P^{(t)}$ is 1, since for every $t$ and every state $x$:
$\sum_{y}P_{x,y}^{(t+1)} = \sum_y P[X_{t+1} = y \mid X_t = x] = 1$
Note that the transition matrix is generally not symmetric; to study its eigenvalues one can work with the normalized Laplacian matrix instead.
State distribution
Let $\pi^{(t)}$ be the state distribution of the chain at time $t$, that is, $\pi^{(t)}_x = P[X_t = x]$.
For a finite chain, $\pi^{(t)}$ is a vector of $N$ nonnegative entries such that $\sum_x\pi_x^{(t)} = 1$. Then it holds that $\pi^{(t+1)} = \pi^{(t)}P^{(t+1)}$. Applying the law of total probability:
$\pi_y^{(t+1)} = \sum_x P[X_{t+1} = y \mid X_t = x]P[X_t = x] = \sum_x\pi_x^{(t)}P_{x,y}^{(t+1)}$
A stationary distribution of a finite Markov chain with transition matrix $P$ is a probability distribution $\pi$ such that $\pi P = \pi$.
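A minimal sketch of how a state distribution converges to the stationary distribution by repeated multiplication with the transition matrix (the $3\times 3$ matrix is made up for illustration):

```python
import numpy as np

# A small transition matrix (each row sums to 1); values are illustrative only.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

pi = np.array([1.0, 0.0, 0.0])   # arbitrary initial state distribution
for _ in range(1000):
    pi = pi @ P                   # pi^{(t+1)} = pi^{(t)} P

print(pi)                         # approximate stationary distribution
print(pi @ P)                     # unchanged: pi P = pi
```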
Irreducibility
State $y$ is accessible from state $x$ if it is possible for the chain to visit state $y$ when it starts in state $x$; in other words, $(P^n)_{x,y} > 0$ for some $n$.
State $x$ communicates with state $y$ if $y$ is accessible from $x$ and $x$ is accessible from $y$. We say that the Markov chain is irreducible if all pairs of states communicate.
Aperiodicity
The period of a state $x$ is the greatest common divisor $d_x = \gcd\{n \mid (P^n)_{x,x} > 0\}$. A state is aperiodic if its period is 1. A Markov chain is aperiodic if all its states are aperiodic.
Theorem
If the states $x$ and $y$ communicate, then $d_x = d_y$.
We have $(P^n)_{x,x} = 0$ if $n \bmod d_x \ne 0$.
Detailed Balance Condition
Let $X_0, X_1, \ldots$ be an aperiodic Markov chain with transition matrix $P$ and distribution $\pi$. If the following condition holds,
$\pi(i)P_{ij} = \pi(j)P_{ji}$
then $\pi(x)$ is the stationary distribution of the Markov chain. The above equation is called the detailed balance condition.
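As a small illustration, a random walk on a weighted graph satisfies detailed balance by construction; the sketch below (with made-up edge weights) checks both the detailed balance condition and $\pi P = \pi$:

```python
import numpy as np

# Random walk on a weighted graph: P[i][j] = w[i][j] / d[i] with d[i] = sum_j w[i][j],
# and pi[i] proportional to d[i]; then pi[i] * P[i][j] = w[i][j] / sum(w) is symmetric.
w = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()

print(np.allclose(pi[:, None] * P, (pi[:, None] * P).T))  # detailed balance holds
print(np.allclose(pi @ P, pi))                            # hence pi P = pi
```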
MCMC algorithm
Revising the Markov Chain
How do we choose $\alpha(i,j)$ such that $\pi(i)P_{ij}\alpha(i,j) = \pi(j)P_{ji}\alpha(j,i)$?
By symmetry, we can simply choose $\alpha(i,j) = \pi(j)P_{ji}$ and $\alpha(j,i) = \pi(i)P_{ij}$; it is easy to see that the equation then holds.
This is the basic idea of MCMC: since a Markov chain converges to a stationary distribution, if that distribution happens to be the one we want to sample from, then once the chain has converged at step $n$, the subsequent states $X^{(n)}, X^{(n+1)}, \ldots$ can be treated as samples from the target distribution.
We construct a kernel $Q_{ij}$ that satisfies the detailed balance condition. When making a transition, sampling from $Q$ can be implemented as first sampling a proposal from $P$ (for example via the uniform distribution mentioned above; a normal proposal can be generated with the Box-Muller transform) and then treating $\alpha$ as an acceptance probability: we draw a uniform random number and check it against $\alpha(i,j) = \pi(j)P_{ji}$; if the check passes, the chain moves to the proposed state.
MCMC Sampling Algorithm
Our goal now is to make transitions according to the kernel $Q$.
Step 3 of the algorithm proposes a move using the original kernel $P$; call the proposed value $x$.
The acceptance rate in step 4 is obtained by plugging the current and proposed states into $\alpha(i,j) = \pi(j)P_{ji}$. The overall probability of actually moving is then $P(x \mid x^{(i)})\alpha(i,j)$, which is smaller than before, and this probability is exactly the corresponding entry of the new kernel $Q$, so the detailed balance condition is satisfied.
Metropolis-Hastings algorithm
The one drawback of this MCMC scheme is the acceptance rate $\alpha$: when $\alpha$ is small, we need many proposals before the chain advances a single step, which is very inefficient. Can we modify the scheme so that $\alpha$ becomes larger?
For the current state $x^{(i)}$, we use the detailed balance condition:
$\pi(x^{(i)})P(x^{(j)} \mid x^{(i)})\alpha(i,j) = \pi(x^{(j)})P(x^{(i)} \mid x^{(j)})\alpha(j,i)$
We only need to divide the $\alpha$ on the right into the left-hand side (or the $\alpha$ on the left into the right-hand side), which scales the acceptance rate up while preserving the balance. The acceptance rate therefore becomes:
$\alpha(i,j) = \min\left\{1, \frac{\pi(x)P(x^{(i)} \mid x)}{\pi(x^{(i)})P(x \mid x^{(i)})}\right\}$
The algorithm therefore changes only slightly:
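Here is a minimal random-walk Metropolis-Hastings sketch. The target (an unnormalized mixture of two normals), the symmetric normal proposal, and all tuning constants are assumptions for illustration; with a symmetric proposal the ratio of proposal densities cancels in the acceptance rate:

```python
import numpy as np

def metropolis_hastings(log_target, n_steps, x0=0.0, proposal_std=1.0, rng=None):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal.

    Because the proposal is symmetric, P(x'|x) = P(x|x') and the acceptance
    rate reduces to min{1, pi(x') / pi(x)}.
    """
    rng = rng or np.random.default_rng()
    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        x_new = x + proposal_std * rng.standard_normal()
        # Work in log space to avoid numerical underflow.
        if np.log(rng.random()) < log_target(x_new) - log_target(x):
            x = x_new
        chain[t] = x
    return chain

# Target: an unnormalized mixture of two normals (the normalizer is never needed).
log_target = lambda x: np.log(np.exp(-0.5 * (x - 3)**2) + np.exp(-0.5 * (x + 3)**2))
chain = metropolis_hastings(log_target, n_steps=50000, proposal_std=2.0,
                            rng=np.random.default_rng(0))
print(chain[10000:].mean(), chain[10000:].std())  # roughly 0 and about 3.2 after burn-in
```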
Gibbs Sampling
Raising the acceptance rate helps, but is there a method that guarantees an acceptance rate of exactly 1?
Intuition
Consider a two-dimensional distribution:
$P(x_1,y_1)P(y_2 \mid x_1) = P(x_1)P(y_1 \mid x_1)P(y_2 \mid x_1)$
$P(x_1,y_2)P(y_1 \mid x_1) = P(x_1)P(y_2 \mid x_1)P(y_1 \mid x_1)$
It is not hard to see that:
$P(x_1,y_1)P(y_2 \mid x_1) = P(x_1,y_2)P(y_1 \mid x_1)$
So transitions that move along one coordinate axis at a time satisfy the detailed balance condition, and the joint distribution is stationary.
Therefore:
This extends easily to the multi-dimensional case.
Advantages
The joint distribution does not need to be known in advance; the method works when only the conditional distributions are available.
The acceptance rate is always 1.
After the chain reaches the stationary distribution, the samples approximate the joint distribution.
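A minimal Gibbs-sampling sketch for a bivariate normal with correlation $\rho$, where only the conditional distributions are used (the value $\rho = 0.8$ and the sample counts are made up for illustration):

```python
import numpy as np

def gibbs_bivariate_normal(n_steps, rho=0.8, rng=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Only the conditionals are needed: x | y ~ N(rho*y, 1 - rho^2) and symmetrically.
    """
    rng = rng or np.random.default_rng()
    x, y = 0.0, 0.0
    out = np.empty((n_steps, 2))
    s = np.sqrt(1.0 - rho**2)
    for t in range(n_steps):
        x = rho * y + s * rng.standard_normal()   # sample x | y
        y = rho * x + s * rng.standard_normal()   # sample y | x
        out[t] = x, y
    return out

samples = gibbs_bivariate_normal(50000, rho=0.8, rng=np.random.default_rng(0))
print(np.corrcoef(samples[5000:, 0], samples[5000:, 1])[0, 1])  # close to 0.8
```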